Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Dong Liu

Augmented Deep Contexts for Spatially Embedded Video Coding

May 08, 2025

Yifan Bian, Chuanbo Tang, Li Li, Dong Liu

Abstract:Most Neural Video Codecs (NVCs) only employ temporal references to generate temporal-only contexts and latent prior. These temporal-only NVCs fail to handle large motions or emerging objects due to limited contexts and misaligned latent prior. To relieve the limitations, we propose a Spatially Embedded Video Codec (SEVC), in which the low-resolution video is compressed for spatial references. Firstly, our SEVC leverages both spatial and temporal references to generate augmented motion vectors and hybrid spatial-temporal contexts. Secondly, to address the misalignment issue in latent prior and enrich the prior information, we introduce a spatial-guided latent prior augmented by multiple temporal latent representations. At last, we design a joint spatial-temporal optimization to learn quality-adaptive bit allocation for spatial references, further boosting rate-distortion performance. Experimental results show that our SEVC effectively alleviates the limitations in handling large motions or emerging objects, and also reduces 11.9% more bitrate than the previous state-of-the-art NVC while providing an additional low-resolution bitstream. Our code and model are available at https://github.com/EsakaK/SEVC.

* 15 pages,CVPR

Via

Access Paper or Ask Questions

Physics-Driven Neural Compensation For Electrical Impedance Tomography

Apr 28, 2025

Chuyu Wang, Huiting Deng, Dong Liu

Abstract:Electrical Impedance Tomography (EIT) provides a non-invasive, portable imaging modality with significant potential in medical and industrial applications. Despite its advantages, EIT encounters two primary challenges: the ill-posed nature of its inverse problem and the spatially variable, location-dependent sensitivity distribution. Traditional model-based methods mitigate ill-posedness through regularization but overlook sensitivity variability, while supervised deep learning approaches require extensive training data and lack generalization. Recent developments in neural fields have introduced implicit regularization techniques for image reconstruction, but these methods typically neglect the physical principles underlying EIT, thus limiting their effectiveness. In this study, we propose PhyNC (Physics-driven Neural Compensation), an unsupervised deep learning framework that incorporates the physical principles of EIT. PhyNC addresses both the ill-posed inverse problem and the sensitivity distribution by dynamically allocating neural representational capacity to regions with lower sensitivity, ensuring accurate and balanced conductivity reconstructions. Extensive evaluations on both simulated and experimental data demonstrate that PhyNC outperforms existing methods in terms of detail preservation and artifact resistance, particularly in low-sensitivity regions. Our approach enhances the robustness of EIT reconstructions and provides a flexible framework that can be adapted to other imaging modalities with similar challenges.

Via

Access Paper or Ask Questions

Partition Map-Based Fast Block Partitioning for VVC Inter Coding

Apr 25, 2025

Xinmin Feng, Zhuoyuan Li, Li Li, Dong Liu, Feng Wu

Abstract:Among the new techniques of Versatile Video Coding (VVC), the quadtree with nested multi-type tree (QT+MTT) block structure yields significant coding gains by providing more flexible block partitioning patterns. However, the recursive partition search in the VVC encoder increases the encoder complexity substantially. To address this issue, we propose a partition map-based algorithm to pursue fast block partitioning in inter coding. Based on our previous work on partition map-based methods for intra coding, we analyze the characteristics of VVC inter coding, and thus improve the partition map by incorporating an MTT mask for early termination. Next, we develop a neural network that uses both spatial and temporal features to predict the partition map. It consists of several special designs including stacked top-down and bottom-up processing, quantization parameter modulation layers, and partitioning-adaptive warping. Furthermore, we present a dual-threshold decision scheme to achieve a fine-grained trade-off between complexity reduction and rate-distortion (RD) performance loss. The experimental results demonstrate that the proposed method achieves an average 51.30% encoding time saving with a 2.12% Bjontegaard Delta Bit Rate (BDBR) under the random access configuration.

* 23 pages, 26 figures. Project page: https://github.com/ustc-ivclab/IPM

Via

Access Paper or Ask Questions

SDEIT: Semantic-Driven Electrical Impedance Tomography

Apr 05, 2025

Dong Liu, Yuanchao Wu, Bowen Tong, Jiansong Deng

Abstract:Regularization methods using prior knowledge are essential in solving ill-posed inverse problems such as Electrical Impedance Tomography (EIT). However, designing effective regularization and integrating prior information into EIT remains challenging due to the complexity and variability of anatomical structures. In this work, we introduce SDEIT, a novel semantic-driven framework that integrates Stable Diffusion 3.5 into EIT, marking the first use of large-scale text-to-image generation models in EIT. SDEIT employs natural language prompts as semantic priors to guide the reconstruction process. By coupling an implicit neural representation (INR) network with a plug-and-play optimization scheme that leverages SD-generated images as generative priors, SDEIT improves structural consistency and recovers fine details. Importantly, this method does not rely on paired training datasets, increasing its adaptability to varied EIT scenarios. Extensive experiments on both simulated and experimental data demonstrate that SDEIT outperforms state-of-the-art techniques, offering superior accuracy and robustness. This work opens a new pathway for integrating multimodal priors into ill-posed inverse problems like EIT.

Via

Access Paper or Ask Questions

MEMERAG: A Multilingual End-to-End Meta-Evaluation Benchmark for Retrieval Augmented Generation

Feb 25, 2025

María Andrea Cruz Blandón, Jayasimha Talur, Bruno Charron, Dong Liu, Saab Mansour, Marcello Federico

Abstract:Automatic evaluation of retrieval augmented generation (RAG) systems relies on fine-grained dimensions like faithfulness and relevance, as judged by expert human annotators. Meta-evaluation benchmarks support the development of automatic evaluators that correlate well with human judgement. However, existing benchmarks predominantly focus on English or use translated data, which fails to capture cultural nuances. A native approach provides a better representation of the end user experience. In this work, we develop a Multilingual End-to-end Meta-Evaluation RAG benchmark (MEMERAG). Our benchmark builds on the popular MIRACL dataset, using native-language questions and generating responses with diverse large language models (LLMs), which are then assessed by expert annotators for faithfulness and relevance. We describe our annotation process and show that it achieves high inter-annotator agreement. We then analyse the performance of the answer-generating LLMs across languages as per the human evaluators. Finally we apply the dataset to our main use-case which is to benchmark multilingual automatic evaluators (LLM-as-a-judge). We show that our benchmark can reliably identify improvements offered by advanced prompting techniques and LLMs. We will release our benchmark to support the community developing accurate evaluation methods for multilingual RAG systems.

Via

Access Paper or Ask Questions

Query Brand Entity Linking in E-Commerce Search

Feb 03, 2025

Dong Liu, Sreyashi Nag

Abstract:In this work, we address the brand entity linking problem for e-commerce search queries. The entity linking task is done by either i)a two-stage process consisting of entity mention detection followed by entity disambiguation or ii) an end-to-end linking approaches that directly fetch the target entity given the input text. The task presents unique challenges: queries are extremely short (averaging 2.4 words), lack natural language structure, and must handle a massive space of unique brands. We present a two-stage approach combining named-entity recognition with matching, and a novel end-to-end solution using extreme multi-class classification. We validate our solutions by both offline benchmarks and the impact of online A/B test.

Via

Access Paper or Ask Questions

Hierarchical Multi-field Representations for Two-Stage E-commerce Retrieval

Jan 30, 2025

Niklas Freymuth, Dong Liu, Thomas Ricatte, Saab Mansour

Figure 1 for Hierarchical Multi-field Representations for Two-Stage E-commerce Retrieval

Figure 2 for Hierarchical Multi-field Representations for Two-Stage E-commerce Retrieval

Figure 3 for Hierarchical Multi-field Representations for Two-Stage E-commerce Retrieval

Figure 4 for Hierarchical Multi-field Representations for Two-Stage E-commerce Retrieval

Abstract:Dense retrieval methods typically target unstructured text data represented as flat strings. However, e-commerce catalogs often include structured information across multiple fields, such as brand, title, and description, which contain important information potential for retrieval systems. We present Cascading Hierarchical Attention Retrieval Model (CHARM), a novel framework designed to encode structured product data into hierarchical field-level representations with progressively finer detail. Utilizing a novel block-triangular attention mechanism, our method captures the interdependencies between product fields in a specified hierarchy, yielding field-level representations and aggregated vectors suitable for fast and efficient retrieval. Combining both representations enables a two-stage retrieval pipeline, in which the aggregated vectors support initial candidate selection, while more expressive field-level representations facilitate precise fine-tuning for downstream ranking. Experiments on publicly available large-scale e-commerce datasets demonstrate that CHARM matches or outperforms state-of-the-art baselines. Our analysis highlights the framework's ability to align different queries with appropriate product fields, enhancing retrieval accuracy and explainability.

Via

Access Paper or Ask Questions

The Gap Between Principle and Practice of Lossy Image Coding

Jan 21, 2025

Haotian Zhang, Dong Liu

Figure 1 for The Gap Between Principle and Practice of Lossy Image Coding

Figure 2 for The Gap Between Principle and Practice of Lossy Image Coding

Figure 3 for The Gap Between Principle and Practice of Lossy Image Coding

Figure 4 for The Gap Between Principle and Practice of Lossy Image Coding

Abstract:Lossy image coding is the art of computing that is principally bounded by the image's rate-distortion function. This bound, though never accurately characterized, has been approached practically via deep learning technologies in recent years. Indeed, learned image coding schemes allow direct optimization of the joint rate-distortion cost, thereby outperforming the handcrafted image coding schemes by a large margin. Still, it is observed that there is room for further improvement in the rate-distortion performance of learned image coding. In this article, we identify the gap between the ideal rate-distortion function forecasted by Shannon's information theory and the empirical rate-distortion function achieved by the state-of-the-art learned image coding schemes, revealing that the gap is incurred by five different effects: modeling effect, approximation effect, amortization effect, digitization effect, and asymptotic effect. We design simulations and experiments to quantitively evaluate the last three effects, which demonstrates the high potential of future lossy image coding technologies.

* 11 pages, 5 figures

Via

Access Paper or Ask Questions

Multimodal semantic retrieval for product search

Jan 13, 2025

Dong Liu, Esther Lopez Ramos

Abstract:Semantic retrieval (also known as dense retrieval) based on textual data has been extensively studied for both web search and product search application fields, where the relevance of a query and a potential target document is computed by their dense vector representation comparison. Product image is crucial for e-commence search interactions and is a key factor for customers at product explorations. But its impact for semantic retrieval has not been well studied yet. In this research, we build a multimodal representation for product items in e-commerece search in contrast to pure-text representation of products, and investigate the impact of such representations. The models are developed and evaluated on e-commerce datasets. We demonstrate that a multimodal representation scheme for a product can show improvement either on purchase recall or relevance accuracy in semantic retrieval. Additionally, we provide numerical analysis for exclusive matches retrieved by a multimodal semantic retrieval model versus a text-only semantic retrieval model, to demonstrate the validation of multimodal solutions.

Via

Access Paper or Ask Questions

Visual Large Language Models for Generalized and Specialized Applications

Jan 06, 2025

Yifan Li, Zhixin Lai, Wentao Bao, Zhen Tan, Anh Dao, Kewei Sui, Jiayi Shen, Dong Liu, Huan Liu, Yu Kong

Figure 1 for Visual Large Language Models for Generalized and Specialized Applications

Figure 2 for Visual Large Language Models for Generalized and Specialized Applications

Figure 3 for Visual Large Language Models for Generalized and Specialized Applications

Figure 4 for Visual Large Language Models for Generalized and Specialized Applications

Abstract:Visual-language models (VLM) have emerged as a powerful tool for learning a unified embedding space for vision and language. Inspired by large language models, which have demonstrated strong reasoning and multi-task capabilities, visual large language models (VLLMs) are gaining increasing attention for building general-purpose VLMs. Despite the significant progress made in VLLMs, the related literature remains limited, particularly from a comprehensive application perspective, encompassing generalized and specialized applications across vision (image, video, depth), action, and language modalities. In this survey, we focus on the diverse applications of VLLMs, examining their using scenarios, identifying ethics consideration and challenges, and discussing future directions for their development. By synthesizing these contents, we aim to provide a comprehensive guide that will pave the way for future innovations and broader applications of VLLMs. The paper list repository is available: https://github.com/JackYFL/awesome-VLLMs.

Via

Access Paper or Ask Questions